    A study of the scale-invariant feature transform on a parallel pipeline

    In this thesis we study the execution of the Scale Invariant Feature Transform (SIFT) algorithm on a pipelined computational platform. The SIFT algorithm is one of the most widely used methods for image feature extraction. We develop a tile-based template for running SIFT that facilitates the analysis while abstracting away lower-level details. We formalize the computational pipeline and the time to execute any algorithm on it based on the relative times taken by the pipeline stages. In the context of the SIFT algorithm, this reduces the time to that of running the entire image through the bottleneck stage plus the time to run either the first or last tile through the remaining stages. Through an experimental study of the SIFT algorithm on a broad collection of test images, we determine image feature fraction values that relate the sizes of the image extracts as the computation proceeds through the stages of the SIFT algorithm. We show that for a single-chip uniprocessor pipeline, the computational stage is the bottleneck. Specifically, we show that for an N × N image with n × n tiles the overall time complexity is $\Theta\left(\frac{(n+x)^2}{p_i}\,\Gamma_0 + \alpha\beta N^2 x^2\,\Gamma_1 + \frac{(\alpha\beta+\gamma)\,n^2 \log x}{p_o}\,\Gamma_2\right)$; here $x$ is the neighborhood of the tile, $p_i$ and $p_o$ are the numbers of input and output pins of the chip, $\alpha$, $\beta$, $\gamma$ are the feature fractions, and $\Gamma_0$, $\Gamma_1$, $\Gamma_2$ are the input, compute, and output clocks. The three terms in the expression represent the time complexities of the input, compute, and output stages. The input and output stages can be slowed down substantially without appreciable degradation of the overall performance. This slowdown can be traded off for lower power and higher signal quality. For multicore chips, we show that for an N × N image on a P-core chip, the overall time complexity to process the image is $\Theta\left(\frac{N^2}{p_i}\,\Gamma_0 + \frac{n^2 w^2 + \alpha\beta n^2 x^2}{P}\,\Gamma_1 + \frac{(\alpha\beta+\gamma)\,n^2 \log x}{p_o}\,\Gamma_2\right)$; in addition to the quantities described earlier, $w$ is the window size used for the Gaussian blurring. Overall, we establish that without improvements in the input bandwidth, the power of multicore processing cannot be used efficiently for SIFT.
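
    The pipeline timing claim in the abstract (the whole image through the bottleneck stage, plus one tile through the remaining stages) can be illustrated with a minimal sketch. It is not the thesis's model: it assumes every tile takes the same fixed time at a given stage, and the function names and stage times below are hypothetical values chosen only for illustration.

```python
def simulate_pipeline(stage_times, num_tiles):
    """Finish time of the last tile, found by stepping tiles through the stages.

    Assumption: each tile takes stage_times[s] at stage s, and a stage
    handles one tile at a time in arrival order.
    """
    finish = [0.0] * len(stage_times)  # when each stage finished its previous tile
    for _ in range(num_tiles):
        ready = 0.0  # when the current tile leaves the preceding stage
        for s, t in enumerate(stage_times):
            start = max(ready, finish[s])  # wait for the tile and for the stage
            finish[s] = start + t
            ready = finish[s]
    return finish[-1]


def closed_form(stage_times, num_tiles):
    """All tiles through the bottleneck stage, plus one tile through the rest."""
    bottleneck = max(stage_times)
    return num_tiles * bottleneck + (sum(stage_times) - bottleneck)


# Hypothetical per-tile times for the input, compute, and output stages.
stages = [1.0, 4.0, 2.0]
print(simulate_pipeline(stages, 16), closed_form(stages, 16))
```

    Under these assumptions the simulated and closed-form times agree, and shrinking the non-bottleneck stage times changes the total only through the single-tile term, which is consistent with the abstract's remark that the input and output stages can be slowed without appreciable loss.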